Sequence context-specific profiles for homology searching: Supplementary Information
Abstract
Generation of context profile library

N = 1 million training profiles of length l = 2d + 1 were generated as described in the main text and Figure 2. Each training profile is represented by a count profile c_n(j, x), which specifies the counts of amino acid x ∈ {1, ..., 20} at position j ∈ {−d, ..., d}. These counts are obtained by multiplying the sequence profile t_n(j, x) by the effective number of sequences N_n(j) at position j in the alignment from which training profile t_n(j, x) was calculated: c_n(j, x) = N_n(j) t_n(j, x) (see next section for details).

Here, we describe how these N profiles are clustered in order to obtain a set of K context profiles which recur frequently among the training profiles and which together can describe all training profiles. More precisely, we seek to determine context profiles p = (p_1, ..., p_K) and their prior probabilities α = (α_1, ..., α_K) that maximize the likelihood P(c | p, α) that the training profile counts c = (c_1, ..., c_N) were generated by the context profiles. We model the distribution of counts c_n(j, x) in each column j by a multinomial distribution. Since c_n(j, x) can be real-valued, however, we replace the factorials in the multinomial distribution by Gamma functions (n! = Γ(n + 1)). The probability for context profile p_k to have emitted counts c_n(j, x) (j ∈ {−d, ..., d}, x ∈ {1, ..., 20}) is
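The excerpt stops before the formula itself. As a minimal sketch of the construction described above, and not the authors' exact expression, the Python below evaluates a multinomial with Gamma functions in place of factorials, one column at a time. It assumes that columns are independent and contribute equally (the original formula may include further terms, such as position-dependent weights), that all context profile probabilities are strictly positive, and that the likelihood P(c | p, α) has the usual mixture form with the priors α as mixture weights. All function names are hypothetical.

```python
import math


def counts_from_profile(seq_profile, neff):
    """c_n(j, x) = N_n(j) * t_n(j, x), one row per column j."""
    return [[n * t for t in column] for column, n in zip(seq_profile, neff)]


def column_log_prob(counts, probs):
    """Log of a multinomial whose factorials are replaced by Gamma functions.

    counts: real-valued counts c_n(j, x) for one column j
    probs:  context profile probabilities p_k(j, x), assumed > 0
    """
    log_p = math.lgamma(sum(counts) + 1.0)  # log Gamma(N_n(j) + 1)
    for c, p in zip(counts, probs):
        log_p -= math.lgamma(c + 1.0)       # log Gamma(c_n(j, x) + 1)
        log_p += c * math.log(p)            # c_n(j, x) * log p_k(j, x)
    return log_p


def profile_log_prob(count_profile, context_profile):
    """Sum of column log-probabilities (assumes independent, unweighted columns)."""
    return sum(
        column_log_prob(c, p) for c, p in zip(count_profile, context_profile)
    )


def mixture_log_likelihood(count_profiles, context_profiles, priors):
    """log P(c | p, alpha), assuming a mixture over the K context profiles."""
    total = 0.0
    for counts in count_profiles:
        terms = [
            math.log(a) + profile_log_prob(counts, p)
            for a, p in zip(priors, context_profiles)
        ]
        m = max(terms)  # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total


# Toy usage with a 2-letter alphabet and a single column (d = 0).
toy_counts = counts_from_profile([[0.5, 0.5]], [2.0])
toy_log_p = profile_log_prob(toy_counts, [[0.5, 0.5]])
```

Working in log space with `math.lgamma` avoids overflow in the Gamma functions for large effective sequence numbers, and the log-sum-exp step keeps the sum over context profiles stable when individual profile probabilities are very small.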
Similar resources
Sequence context-specific profiles for homology searching.
Sequence alignment and database searching are essential tools in biology because a protein's function can often be inferred from homologous proteins. Standard sequence comparison methods use substitution matrices to find the alignment with the best sum of similarity scores between aligned residues. These similarity scores do not take the local sequence context into account. Here, we present an ...
Improving protein fold recognition with hybrid profiles combining sequence and structure evolution
MOTIVATION: Template-based modeling, the most successful approach for predicting protein 3D structure, often requires detecting distant evolutionary relationships between the target sequence and proteins of known structure. Developed for this purpose, fold recognition methods use elaborate strategies to exploit evolutionary information, mainly by encoding amino acid sequence into profiles. Since...
Protein threading using context-specific alignment potential
MOTIVATION: Template-based modeling, including homology modeling and protein threading, is the most reliable method for protein 3D structure prediction. However, alignment errors and template selection are still the main bottleneck for current template-based modeling methods, especially when proteins under consideration are distantly related. RESULTS: We present a novel context-specific alignmen...
The global trace graph, a novel paradigm for searching protein sequence databases
MOTIVATION: Propagating functional annotations to sequence-similar, presumably homologous proteins lies at the heart of the bioinformatics industry. Correct propagation is crucially dependent on the accurate identification of subtle sequence motifs that are conserved in evolution. The evolutionary signal can be difficult to detect because functional sites may consist of non-contiguous residues w...
Accelerating Information Retrieval from Profile Hidden Markov Model Databases
Profile Hidden Markov Model (Profile-HMM) is an efficient statistical approach to represent protein families. Currently, several databases maintain valuable protein sequence information as profile-HMMs. There is an increasing interest to improve the efficiency of searching Profile-HMM databases to detect sequence-profile or profile-profile homology. However, most efforts to enhance searching ef...
Publication date: 2009